Remote control signaling using audio watermarks

ABSTRACT

A system for using a watermark embedded in an audio signal to remotely control a device. Various devices such as toys, computers, and appliances, equipped with an appropriate detector, detect the hidden signals, which can trigger an action, or change a state of the device. The watermarks can be used with a “time gate” device, where detection of the watermark opens a time interval within which a user is allowed to perform an action, such as pressing a button, typing in an answer, turning a key in a lock, etc. To prevent fraudulent activation of a time gate, the time gate device can be configured to react only to watermarks coming from live broadcasts, and not from replays from tapes or other storage devices. In another feature, robustness of the watermark is improved, e.g., for acoustic propagation channels, by shifting the detection time of the watermark based on a measured bit error count of the watermark. Furthermore, the watermark may be inserted before the desired action along with corresponding offset information if the audio signal is not suitable at the time of the action.

[0001] This application is a divisional of co-pending, commonly assigned U.S. patent application Ser. No. 09/505,080 filed on Feb. 16, 2000.

BACKGROUND OF THE INVENTION

[0002] The present invention relates to a method and apparatus for remotely controlling a device, such as a toy, lock, smart card, or home appliance, via a control message that is imperceptibly embedded in an audio signal, e.g., as a “watermark”. Moreover, the invention optionally enables the device to be synchronized with the audio signal, for example, so that the actions of a doll can be synchronized with a children's television program.

[0003] Audio signals are ubiquitous, being broadcast over AM/FM radio, TV, public announcement systems, transmitted over telephone channels, or stored on cassette tapes, CDs, computer memories, etc. Therefore, it is convenient to use audio channels or audio storage to transmit or store some other information.

[0004] Audio watermarking, or embedded signaling, has recently emerged as a technology for embedding auxiliary data imperceptibly in a host audio signal. A basic feature of audio watermarking techniques is that the embedded signal is substantially imperceptible to a listener of the host signal. Furthermore, the audio watermarks occupy the same time/frequency/space domain as the host signal, so that they are not lost in standard audio signal processing, recording, or transmission, nor can filtering and/or masking operations in a deliberate attack remove them.

[0005] A primary proposed use of watermarking is in protecting intellectual property rights, e.g., through copy control, automatic broadcast monitoring, ownership dispute resolution, Internet commerce tracking, etc. Alternative applications include auxiliary data embedding, such as the song title and purchasing instructions, assurance of content integrity, proof of performance in TV and radio advertisements, audience exposure monitoring, caller identification (authentication) in telephone conversations, or generic covert communication.

[0006] Moreover, various schemes have been proposed for sending command and control signals, or their equivalent, concurrently with audio signals. However, these schemes do not qualify as audio watermarking techniques. For example, in one proposed scheme, an “instructional signal” is inserted in a narrow frequency band set aside at the upper frequency edge of the audio spectrum. However, this system does not qualify as a watermarking system since the host and the control signals occupy distinct frequency bands.

[0007] In another proposed scheme, a unique code describing an offer for products and services is transmitted by a TV program as an audible “beep”. There is no attempt to hide this beep, so this technique also is not audio watermarking.

[0008] In yet another proposed scheme, information related to a TV game show is encoded in touch tones and broadcast in-band with an audio portion of the show. The touch tones can be masked by the show's usual sound effects, such as buzzers and beeps. This is substantially different from the watermarking approach, because it cannot simultaneously meet the inaudibility requirement and the requirement for the time domain overlap of a watermark and an arbitrary audio signal.

[0009] Accordingly, it would be desirable to provide a watermarking system for sending command and control signals concurrently with audio signals that overcomes the disadvantages of the existing proposed schemes.

[0010] The system should use watermarking techniques to provide a hidden data channel in an audio signal for providing short messages, such as device activation commands, or remote control signals that can change the state of a device.

[0011] The system should be compatible with existing watermarking techniques, such as those disclosed in U.S. Pat. No. 5,940,135 to Petrovic et al., entitled “Apparatus and Method for Encoding and Decoding Information in Analog Signals,” issued Aug. 17, 1999, and incorporated herein by reference.

[0012] The system should provide a hidden remote control signal as a watermark within an audio signal for controlling various devices that detect the hidden signal.

[0013] The system should allow the remote control signal to be related to, or independent of, content of the host audio signal. For related content, the system should optionally provide synchronization of the remote control signal with the host audio signal content.

[0014] The system should use a watermark to define a time gate (window) during which a device is enabled to receive a user input or perform a specified action.

[0015] The system should provide a security mechanism to ensure that the time gate is defined only from a real-time broadcast audio signal, and not from a replay of the audio signal.

[0016] The system should improve the robustness and temporal resolution of a watermark, and provide a simplified watermark detector.

[0017] The system should provide synchronization of a watermark encoder and decoder.

[0018] The present invention provides a system having the above and other advantages.

SUMMARY OF THE INVENTION

[0019] The present invention relates to a system for using a watermark embedded in an audio signal to remotely control a device.

[0020] In particular, the system is compatible with existing audio-watermarking technologies that use audio channels and/or audio storage to carry independent data without interfering with the audio channel's original purpose. However, such a channel has much lower information capacity than a modem channel, typically no more than about twenty bits per second per audio channel. The invention uses this hidden data channel for relatively short messages, such as device activation commands, or remote control signals that can change the state of a device.

[0021] A remote control signal is hidden within an audio signal that is broadcast over radio and TV, stored on CDs, DVDs, tape or computer memory, played over speakers and/or transmitted over other audio channels. Various devices such as toys, computers, and appliances that are equipped with an appropriate detector detect the hidden signal to trigger an action, or change a state of the device. The device action can be completely unrelated to the ongoing audio content, and it can have a number of different objectives, such as entertainment, education, sales, security, etc.

[0022] In one particular implementation, a “time gate” device is disclosed, where detection of the watermark opens a time interval within which a user is allowed to perform an action, such as pressing a button, typing in an answer, turning a key in a lock, etc. To prevent fraudulent activation of a time gate, the time gate device can be further upgraded to react only to watermarks coming from live broadcasts, and not from replays from tapes or other storage devices.

[0023] In another implementation, detection of the watermark triggers an action.

[0024] Additionally, techniques are presented for improving existing watermarking technology in view of requirements for the proposed applications. In particular, the invention provides improvements in robustness of the watermark in channels with acoustic propagation (e.g., propagation through air) using delay hopping (watermarking adjacent bits using distinct autocorrelation delays), robustness improvements using redundant watermarking, improvements in the time resolution of the trigger feature, and simplifications of the detector design.

BRIEF DESCRIPTION OF THE DRAWINGS

[0025] FIG. 1 illustrates an audio watermarking process in accordance with the present invention.

[0026] FIG. 2 illustrates a system for remote control of a device, such as a toy, in synchronism with audio data, such as from a television program, in accordance with the present invention.

[0027] FIG. 3(a) illustrates a time gate defined by start and stop watermarks in accordance with the present invention.

[0028] FIG. 3(b) illustrates a time gate defined by a start watermark and a fixed interval τ in accordance with the present invention.

[0029] FIG. 3(c) illustrates a time gate defined by a start watermark and a multiple N of a fixed interval τ′ in accordance with the present invention.

[0030] FIG. 4 illustrates a real-time time gate application in accordance with the present invention.

[0031] FIG. 5 illustrates the use of countdown watermarks for defining a start time of a desired action in accordance with the present invention.

[0032] FIG. 6 illustrates an autocorrelation modulation extractor based on sign correlation in accordance with the present invention.

[0033] FIG. 7 illustrates an example of a bit error count versus a time shift for detecting a watermark in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0034] The present invention relates to a system for using a watermark embedded in an audio signal to remotely control a device.

[0035] FIG. 1 illustrates an audio watermarking process in accordance with the present invention.

[0036] At an encoding side 100, a watermark, i.e., an embedded signal, is inserted into an audio signal at an embeddor 105, using a key, which is a set of parameters that define the hiding process. The key may comprise a steganographic key. The composite signal that is output from the embeddor 105 can be recorded, transmitted over various channels, or processed in different ways, which usually includes corruption by noise and distortion.

[0037] The composite signal is received at a decoding side 150, where the embedded signal (watermark) is retrieved from the composite signal in an extractor 155 with the help of the key used in the embedding process.

[0038] Various details regarding conventional signal processing techniques, such as compression, coding, error-correction, modulation, and the like, are not explicitly disclosed but their appropriate use will be evident to those skilled in the art.

[0039] The embeddor 105 may provide the watermark in the audio signal using various known watermarking techniques, including those discussed in the aforementioned U.S. Pat. No. 5,940,135, where the short-term autocorrelation function of the host audio signal is modulated in such a way as to match the embedded signal. The key contains the information about the frequency band of the host signal used for hiding, the delay set used for autocorrelation calculation and its change patterns, baseband symbol waveform, data packet structure, scrambling key, etc. With this system, the extractor 155 calculates the short-term autocorrelation using the same key, and regenerates the inserted message with the help of standard digital signaling techniques.

[0040] Typical bit rates of the embedded messages are low, ranging from a few bits per second to a few tens of bits per second. For example, the International Federation of the Phonographic Industry required 20 bps per channel for its review of watermarking technologies. Those skilled in the art can appreciate that an increase in the bit rate brings reduced robustness, increased audibility and/or increased complexity. Therefore, only relatively short messages can be hidden within an audio signal. This is quite suitable for intellectual property protection, where copy control information and/or content identification codes of less than one hundred bits are embedded.

[0041] However, the present invention proposes the use of audio watermarks for remote control of various devices, such as toys, locks, smart cards, appliances, etc., over standard audio channels, such as radio and TV broadcasts, audio tapes, CDs, telephone channels, public address systems, etc. As an example, we will describe a system for remote control of toys participating in a TV show, as illustrated in FIG. 2.

[0042] FIG. 2 illustrates a system for remote control of a device 290, such as a toy, in synchronism with audio data, such as from a children's television program, in accordance with the present invention.

[0043] The control messages are inserted using audio watermarking at appropriate places of the show's audio track using an embeddor 205 and a synchronization device 210. The composite audio signal is optionally stored, e.g., on a tape 215, and subsequently broadcast via a transmitter 220 and antenna 225. While antennas 225 and 230 are shown as an illustration, any type of broadcast scheme can be used, including delivery via a terrestrial path, cable, optical fiber, and/or computer network. Moreover, broadcast of the composite signal to a large population of receivers is not required, as the invention is also suitable for any transmission, including point-to-point transmissions, video-on-demand transmissions in a cable television network, and so forth. Moreover, the composite audio signal may be played back locally at a user's home from a storage device such as a tape or disc.

[0044] A TV receiver 235 receives the signal via an antenna 230 and the audio signal is played over a TV speaker 240. The toy 290, specially designed in accordance with the present invention, includes components 291, including a built-in microphone 292 for picking up (detecting) the audio signal, a watermark extractor 294 for extracting the watermarks from the composite audio signal, and a control 295 that is responsive to the watermarks for performing some action.

[0045] The toy 290 may optionally be hard-wired to the receiver using appropriate jacks and wiring, in which case the microphone 292 is not needed.

[0046] A motor 296, audio function 297, such as a speech synthesizer, and lights function 298 are responsive to the control 295. For example, if the toy 290 is a doll, the audio function 297 may play a prerecorded message in concert with the ongoing show. The motor 296 may cause the arms, legs, head and/or mouth of the toy 290 to move. The lights function 298 may cause the toy's eyes to light up. This creates the appealing impression that the toy is actually following the show together with the children, and that it participates.

[0047] Advantageously, no modification is required for the TV channel, including the TV signal storage equipment, satellite distribution channels, broadcast equipment, and TV sets. Additional equipment consists only of an embeddor, specially designed for precise watermark insertion at a desired segment of the audio track, and a mass-produced, inexpensive detector incorporated in a suitably-designed toy and connected to the toy controller 295. Also, note that the same toy 290 can be activated by audio watermarks coming from any audio source, such as an AM/FM radio broadcast, CD or tape player, or speakers wired to a computer.

[0048] An important feature in the previous example is the synchronization of the action of a toy (or toys) with the ongoing show. To achieve this, the watermark should be embedded in the audio track segment immediately preceding the desired moment of the toy action, with a small allowance for processing and propagation delays. This synchronizing feature can be useful in many other practical applications.

[0049] Time Gate

[0050] For example, we will describe here a device suitable for watermark-based activation that we will call a “time gate” (e.g., window). In accordance with the invention, the detection of a watermark causes the time gate to open a time interval, during which a user is allowed to perform an action. For example, during an interactive TV quiz show, the viewers may participate by keying in their answers to a hand-held unit, while the players in the television studio prepare their answers. This can be achieved if the audio track has appropriately inserted watermarks, and if a time gate device synchronized to the watermarks controls access to the hand-held unit.

[0051] Such a hand-held unit can have componentry similar to that of the toy 290 of FIG. 2. The control 295 can be configured to send a message to the user to alert the user that the time gate has started or ended, the duration of the time gate, and the amount of time remaining in the time gate. An output screen on the hand-held unit, such as a liquid-crystal display (LCD), may be appropriate for this purpose. Or, the hand-held unit may send a signal back to the television receiver 235 (e.g., via a wired path, infra-red signal or the like) to provide a display on a TV or computer screen that informs the user of the information provided by the watermark.

[0052] Alternative applications, discussed below, include an automated audio/video exam, where the time gate defines a period during which the user can enter responses for the exam; alertness monitoring, where the user is required to provide an input during the time gate; TV coupon collection, where electronic coupons can be retrieved by a user during the time gate; remote control of a lock, where the lock can be opened or closed only during the time gate; and so forth.

[0053] Three different designs of the time gate protocol are illustrated in FIGS. 3(a)-3(c).

[0054] FIG. 3(a) illustrates a time gate defined by start and stop watermarks in accordance with the present invention.

[0055] This design uses two distinct watermarks, including a start watermark 305 and a stop watermark 310, to mark the desired beginning and the end, respectively, of the gated interval 315 bounded by T₁ and T₂. Note that the boundaries of the interval 315 are shown as occurring slightly after the end of the corresponding watermarks, due to processing and propagation delays. This design can mark an arbitrary time interval that is larger than the duration of the stop watermark 310, since the duration of the stop watermark 310 consumes a portion of the time gate 315.

[0056] FIG. 3(b) illustrates a time gate defined by a start watermark and a fixed interval τ in accordance with the present invention.

[0057] This time gate protocol design is simpler than that of FIG. 3(a) since it requires a single start watermark 330 before the beginning of the marked interval 335. The duration of the marked interval (time gate), τ, is predefined, e.g., at the time the device, such as the toy 290 in FIG. 2, is manufactured. However, the interval may be re-programmable (e.g., by replacing a memory chip). This design is suitable if the user action is simple, like pressing a button, or turning a key in a lock.

[0058] FIG. 3(c) illustrates a time gate defined by a start watermark and a multiple N of a fixed interval τ′ in accordance with the present invention.

[0059] This time gate design requires a single watermark 350, but still can mark a variable time interval. This is achieved by inserting an integer number N into the watermark's data field, where the interval 360 is N times a predefined time slot duration τ′.

[0060] The value N can be carried in the same watermark as the control data or in a separate watermark, or can even be provided beforehand. A preferred solution is for the same watermark 350 to include both N and the control data (for reasons of efficiency, potential for false action, etc.).
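The three time gate protocols of FIGS. 3(a)-3(c) differ only in how the end of the interval is obtained. The following is a minimal sketch of that arithmetic, not a definitive implementation. The class and function names (TimeGate, gate_start_stop, gate_fixed, gate_slotted) are hypothetical, times are assumed to be seconds on the device's local clock, and processing and propagation delays are assumed to be already included in the detection times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeGate:
    """A gated interval [start, end) in seconds on the device's local clock."""
    start: float
    end: float

    def is_open(self, t: float) -> bool:
        # User input is accepted only while the gate is open.
        return self.start <= t < self.end


def gate_start_stop(t_start_detected: float, t_stop_detected: float) -> TimeGate:
    """FIG. 3(a): gate bounded by detection of distinct start and stop watermarks."""
    if t_stop_detected <= t_start_detected:
        raise ValueError("stop watermark must be detected after start watermark")
    return TimeGate(t_start_detected, t_stop_detected)


def gate_fixed(t_detected: float, tau: float) -> TimeGate:
    """FIG. 3(b): single start watermark, predefined duration tau."""
    return TimeGate(t_detected, t_detected + tau)


def gate_slotted(t_detected: float, n_slots: int, tau_slot: float) -> TimeGate:
    """FIG. 3(c): single watermark carrying N; duration is N times tau_slot."""
    if n_slots < 0:
        raise ValueError("N must be non-negative")
    return TimeGate(t_detected, t_detected + n_slots * tau_slot)


if __name__ == "__main__":
    gate = gate_slotted(t_detected=10.0, n_slots=3, tau_slot=2.0)
    print(gate, gate.is_open(12.5), gate.is_open(16.0))
```

In this sketch the slotted design reduces to the fixed design when N is 1, which reflects why only a small data field is needed to mark a variable interval.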

[0061] FIG. 4 illustrates a real-time time gate application in accordance with the present invention.

[0062] Many time gate applications allow pre-processing of the audio track in a similar manner as depicted in FIG. 1, e.g., for toy applications. For example, the time gate device can be used in an educational process to monitor alertness of the student. In this case, a pre-recorded lecture (video/audio, or audio only) contains watermarked content that opens a time gate interval after questions/instructions that require the student's response, such as pressing yes/no buttons on the time gate device. The device may score correct/incorrect answers for review by a teacher, but we expect major educational benefits by simply confirming that the student was paying attention to the audio signal.

[0063] Similarly, an advertisement company may poll viewers about an advertised product, and engage their attention. Active participation of users is expected to bring better recognition of products in a shorter time, and lower advertisement costs. In particular, as an encouragement for participation, an advertiser may offer discounts to those who bring the time gate device to a specified retail store, where the user's answers to the poll questions are downloaded from the device. This is equivalent to bringing cut-out coupons from a paper advertisement. Again, the watermarks are pre-stored in the audio track of TV or radio ads.

[0064] However, in other applications, the embedding of the watermark should occur in real-time, while the audio signal is being transmitted. Such a case occurs when the time gate is used, for instance, to control a user's access to a secure area or safe box, or to override other restrictions. For example, an innocuous public address system announcement or background music may be used with an electronic lock to allow a key to operate within the defined time interval. Similarly, a user may remotely control a door lock in the user's residence by calling home over a telephone line, and speaking to the telephone answering machine (with its speaker on). The user's telephone set includes an encoder that embeds watermarks with control data into the user's voice. When the voice is received at a decoder with a watermark extractor at an electronic door lock mechanism, the lock can be activated.

[0065] In a related example, background music or a public address announcement in a larger facility such as an office may be used as a host audio signal for watermarked control data to remotely lock or unlock doors, filing cabinets and so forth without requiring re-wiring of the facility.

[0066] With these applications, it is important that there is no detectable communication channel between the user and the controlled device, so that potential attackers are oblivious to the control mechanism. Furthermore, if an attacker does learn the operational principle of the lock, he/she should not be able to gain access to the secure area by recording and subsequently replaying the audio signal with the embedded control data.

[0067] Even if an attacker knows the operational principle of the lock, he/she cannot forge the message if he/she does not know the secret key used in the embedding process, as shown in FIG. 1.

[0068] The real-time time gate system shown in FIG. 4 addresses these concerns. An encoding side 400 includes an audio source 405, an embeddor 410, a clock 415 and a control data encoder 420. A receiving side 470 that receives the real-time transmission of the composite audio signal includes an extractor 475, a logic function 480, and a clock 485.

[0069] The embeddor 410 receives a continuous audio signal from the audio source 405, and also receives a string of messages from the data encoder 420. The messages include the control data, as well as timing information from the local clock 415, with possibly some additional data (e.g., number of slots for slotted time gate). The message embedding is triggered from outside, e.g., by an operator pressing a button on the embeddor 410. The message is inserted into the audio data stream using a secret key. In the real-time operation, the message is immediately transmitted, i.e., there is no recording of the audio signal except for a short buffering in the embeddor 410 necessary for the embedding process (up to a few tens of milliseconds). The output composite signal is transmitted over a standard audio channel (telephone, radio, TV, public announcement system, etc.). The receiver 470 detects the signal and passes it to the extractor 475, which detects the message and passes it to the logic function 480 for verification.

[0070] The logic function 480 compares the timing data from the incoming message to locally-generated timing data from the clock 485. If the match is sufficiently close, the logic function 480 concludes that a real-time transmission has occurred, and the control information taken from the received message is passed, e.g., starting the time gate. Simultaneously, the local clock 485 may be adjusted (e.g., synchronized with the clock 415) if the discrepancy with the transmitter clock 415 is within predefined bounds. However, if the time discrepancy is too large (e.g., the local time is significantly after the transmitted time), it implies that recording and play back may be taking place, so the message is ignored.

[0071] If the local time is significantly before the transmitted time, this implies a significant mis-calibration, and a default mode can be invoked to ignore the message.

[0072] Clearly, potential attacks based on storing and replaying messages would introduce large delays, certainly more than a few seconds. On the other hand, propagation and processing delays are certainly less than a second, so there should be a clear separation of the two cases. Also, typical clock devices can maintain a timing error well below a couple of seconds for quite a long time, so that the time drift of local clocks can also be distinguished from the store-and-replay attack. Occasional re-synchronization of the receiver clock with the transmitter's clock should keep the timing mismatch between the send and receive clocks within tight bounds indefinitely. This re-synchronization can be achieved using known techniques.
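The decision made by the logic function 480 can be sketched as a comparison of the transmitted time against the local clock. The following is an illustrative sketch only. The names Verdict and check_timestamp are hypothetical, and the two-second tolerances are assumed values chosen to fall between the sub-second propagation delay and the multi-second replay delay discussed above; they are not values specified by the invention.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"                            # real-time transmission assumed
    REPLAY_SUSPECTED = "replay_suspected"        # local time far after sent time
    CLOCK_MISCALIBRATED = "clock_miscalibrated"  # local time far before sent time


def check_timestamp(tx_time: float, local_time: float,
                    max_lag: float = 2.0, max_lead: float = 2.0) -> Verdict:
    """Compare the time carried in the watermark with the receiver's clock.

    tx_time and local_time are seconds on a shared time base.
    max_lag and max_lead are assumed tolerances, not values from the source.
    """
    delta = local_time - tx_time
    if delta > max_lag:
        # Message arrived much later than it was stamped: possible recording
        # and playback, so the message is ignored.
        return Verdict.REPLAY_SUSPECTED
    if delta < -max_lead:
        # Receiver clock is well behind the transmitter: default to ignoring.
        return Verdict.CLOCK_MISCALIBRATED
    return Verdict.ACCEPT


if __name__ == "__main__":
    print(check_timestamp(tx_time=100.0, local_time=100.4))
    print(check_timestamp(tx_time=100.0, local_time=130.0))
    print(check_timestamp(tx_time=100.0, local_time=90.0))
```

Only the ACCEPT outcome would pass the control information onward; in that case the small residual difference could also be used to adjust the local clock, as described for the clock 485.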

[0073] Watermark Design Considerations

[0074] The applications described above impose somewhat different sets of requirements on the audio watermark design than do the intellectual property protection applications. For example, most of the applications described above imply an acoustic (free-space) propagation channel, while the property rights establishment is usually done on an electronic form of the audio signal. The use of an acoustic propagation channel is especially challenging since it generates intersymbol interference due to multipath, signals may be corrupted with background noise, and the distortions in speakers and microphones are usually large compared to electronic channel distortions.

[0075] Furthermore, some of the above applications, such as the toy applications, require very inexpensive designs for the watermark extractors. On the other hand, in some other applications, such as remote control of locks, it is equally important to have an inexpensive embeddor as well, e.g., for the telephone set example. Finally, the time gate device requires careful consideration of timing tolerances to achieve the best possible resolution in time domain, as opposed to typical copyright protection applications, where the location of the watermark within a signal is relatively unimportant.

[0076] With the above requirements in mind, the present inventors looked at different watermarking technologies, and found that the best-suited technology is the autocorrelation modulation (ACM) technique described in the aforementioned U.S. Pat. No. 5,940,135. However, any other suitable watermarking/data embedding technique may be used.

[0077] ACM features a simple design for the embeddor and extractor, high robustness, large throughput, low probability of false detections, good layering capability, and full inaudibility. However, the present invention provides improvements and the special selection of design parameters to optimize ACM for the proposed applications. Herein, several techniques are disclosed that can substantially improve the performance of ACM in remote control signaling using audio watermarks.

[0078] The acoustic-coupling environment, where a speaker broadcasts an audio signal, and the detector captures it through a microphone, can be improved by a special watermark design that is not addressed by conventional techniques. The main issue is the multipath propagation caused by reflections of acoustic waves, which may introduce intersymbol interference to the watermark detector. Standard techniques to fight intersymbol interference, such as adaptive equalization, are too costly for an inexpensive detector. An increase in the bit interval is helpful, but has the obvious drawback of reducing the watermark bit rate.

[0079] To avoid intersymbol interference, we propose watermarking adjacent bits using distinct autocorrelation delays (“delay hopping”). In effect, distinct autocorrelation delays can be considered as distinct channels, with little interference between them. This aspect of the invention increases the watermark robustness, which is particularly useful in the acoustic-coupling environment, but may be used in other environments as well. That is, if consecutive symbols are sent over distinct channels, they cannot cause intersymbol interference, regardless of the pulse broadening caused by a multipath environment.
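One way to picture delay hopping is as a schedule that assigns an autocorrelation delay to each bit position such that adjacent bits never share a delay. The sketch below assumes a simple cyclic hop pattern over a set of distinct delays; the actual delay set and its change pattern are part of the key, so the function name delay_hopping_schedule, the cyclic order, and the example delay values are all assumptions made for illustration.

```python
from typing import List, Sequence


def delay_hopping_schedule(num_bits: int, delays: Sequence[int]) -> List[int]:
    """Assign one autocorrelation delay (in samples) to each bit position.

    A cyclic pattern over at least two distinct delays guarantees that
    consecutive bits use different delays, i.e., different "channels".
    """
    if len(delays) < 2 or len(set(delays)) != len(delays):
        raise ValueError("need at least two distinct delays")
    return [delays[i % len(delays)] for i in range(num_bits)]


if __name__ == "__main__":
    schedule = delay_hopping_schedule(5, [3, 5, 7])  # hypothetical delay set
    print(schedule)
```

With such a schedule, the tail of one bit that is smeared into the next bit interval by multipath is embedded at a delay the extractor is not currently observing.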

[0080] In a further aspect of the invention, robustness of the watermarking is improved by first evaluating the masking ability of the host audio signal before embedding the watermark. In a typical scenario, the device activation, time gate opening, or other actions occur upon detection of the corresponding watermark. This means that the encoder should insert the watermark immediately prior to the desired moment, taking into account propagation and processing delays. However, the desired (candidate) watermark insertion interval may be unsuitable, for example, if it is mainly a silence. Accordingly, the watermark can be inserted before the desired instant of action, along with information for informing the decoder about the delay between the watermark detection and the desired action, which corresponds to the delay between the time segment in which the watermark is embedded and the original desired time segment.

[0081] There is a tradeoff between the flexibility in choosing the optimum watermark insertion time and the amount of the payload (bits) assigned to the delay information.
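The tradeoff can be made concrete under one assumed encoding, in which the delay is carried as an unsigned integer count of fixed time steps. This quantized encoding, the step size, and the names max_offset_seconds and encode_offset are assumptions for illustration and are not specified above.

```python
def max_offset_seconds(delay_bits: int, step_seconds: float) -> float:
    """Largest delay expressible with delay_bits bits at the given step size."""
    return (2 ** delay_bits - 1) * step_seconds


def encode_offset(offset_seconds: float, delay_bits: int,
                  step_seconds: float) -> int:
    """Quantize a delay to an integer number of steps for the payload field."""
    steps = round(offset_seconds / step_seconds)
    if not 0 <= steps <= 2 ** delay_bits - 1:
        raise ValueError("offset does not fit in the delay field")
    return steps


if __name__ == "__main__":
    # Each extra payload bit roughly doubles how early the watermark may be placed.
    for bits in (2, 4, 6):
        print(bits, max_offset_seconds(bits, step_seconds=0.5))
```

Under this assumption, each additional bit of delay information roughly doubles the span of audio in which a suitable insertion point may be sought, at the cost of payload on a channel that carries only a few tens of bits per second.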

[0082] FIG. 5 illustrates the use of countdown watermarks for defining a start time of a desired action in accordance with the present invention.

[0083] In a further alternative embodiment, a string of watermarks 510 is embedded before the desired start of action (T₁). Each watermark, such as watermarks 512, 514 and 516, has a countdown data field (n=2, 1, 0) that indicates the number of remaining watermarks before (T₁). Detection of any of the watermarks in the string 510 allows the calculation of the desired timing of the action. For example, if the countdown field of a particular watermark contains the value n, and the watermark duration is w seconds, then the desired action should begin n*w seconds after the particular watermark is detected, plus some additional propagation and processing delay.

[0084] This provides improved robustness since the start time of the desired action is designated with redundancy. Thus, even if all but one of the watermarks are not received correctly, the start time will still be designated.
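The countdown rule can be checked with a short calculation. The sketch below assumes that each watermark is detected at the instant it ends, and it treats the additional propagation and processing delay as a separate parameter; the function name action_time and the numeric values are hypothetical.

```python
def action_time(detect_time: float, countdown: int,
                watermark_duration: float, extra_delay: float = 0.0) -> float:
    """Start time of the action: n*w seconds after detection, plus extra delay."""
    if countdown < 0:
        raise ValueError("countdown field must be non-negative")
    return detect_time + countdown * watermark_duration + extra_delay


if __name__ == "__main__":
    w = 1.5  # assumed watermark duration in seconds
    # Three back-to-back watermarks starting at t = 0 with countdown 2, 1, 0,
    # each detected at the moment it ends.
    detections = [(1.5, 2), (3.0, 1), (4.5, 0)]
    for t, n in detections:
        print(n, action_time(t, n, w))
```

Every watermark in the string yields the same start time, so detecting any single one of them is sufficient to schedule the action.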

[0085]FIG. 6 illustrates an autocorrelation modulation extractor basedon sign correlation in accordance with the present invention.

[0086] The simple decoder design described in the aforementioned U.S.Pat. No. 5,940,135 includes a filter, followed by a delay line,correlator (multiplier), and an integrator. The output of the integratorrepresents a base-band watermark signal without normalization. When abinary message has been embedded, the output is positive at the decisionmoment if a “one” bit is embedded, or negative for a “zero” bit. In thisdesign, it is important to maintain a very precise delay in the delayline; small errors in the delay can bring significant distortion of thewatermark signal. Those skilled in the art know that precise delays arebest achieved by a digital delay line. This means that an A/D converteris necessary prior to the delay line, which add to the cost of thedecoder.

[0087] In accordance with the present invention, the decoder of U.S. Pat. No. 5,940,135 can be further simplified as shown in FIG. 6 to meet even lower cost demands, suitable for toy applications, without compromising the delay precision.

[0088] This decoder 600 includes a filter 610 that receives the composite signal with the watermark, e.g., from a channel or a storage device, and a comparator 620 for comparing the filtered signal to ground 630. An AND gate 640 receives the output of the comparator 620, and a local clock signal. An XNOR (exclusive-NOR) gate 660 receives a direct output of the AND gate 640, as well as a shifted version of the output via shift register 650. The output of the XNOR gate 660 is provided to a counter 670, which communicates with a logic function 680.

[0089] Thus, instead of multiplying the received signal with a delayed version of itself, it is possible to detect the signal polarity, and then perform an XNOR operation between the signal and a delayed version of itself (i.e., if the signs are the same, the output is one, if opposite, the output is zero). Then, instead of integration, it is enough to run the counter 670 at a clock rate that is much higher than the bit rate for the duration of the watermark bit. If the count at the end of the bit interval is more than half the maximum count, then a “one” bit is detected; otherwise a “zero” bit is detected. A synchronizing algorithm, residing in the logic block 680, determines the beginning and the end of the bit interval, and generates a reset signal for the counter at the end of each bit.

[0090] The above simplification shows that the comparator substitutes for an A/D converter, the XNOR gate 660 replaces a multiplier, and the counter 670 substitutes for an integrator. Moreover, the present inventors have confirmed through experimentation that the proposed simplification does not significantly reduce robustness.
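
By way of non-limiting illustration, the sign-correlation extraction of FIG. 6 may be modeled in software as sketched below. This is a simplified model under stated assumptions, not a description of the actual circuit: the input is taken to be already filtered and sampled at the local clock rate, the bit boundaries are assumed to be known (the synchronizing algorithm of logic block 680 is not modeled), and the helper `toy_signal` is a hypothetical test stimulus that merely produces same-sign or opposite-sign samples at the chosen delay. It is not the embedding method of U.S. Pat. No. 5,940,135.

```python
import math
from collections import deque


def sign_correlation_decode(samples, delay, samples_per_bit):
    """Decode bits by XNOR-ing the signal polarity with a delayed copy."""
    shift_register = deque([0] * delay, maxlen=delay)  # models shift register 650
    bits = []
    count = 0                                          # models counter 670
    for k, x in enumerate(samples):
        sign = 1 if x > 0.0 else 0                     # comparator against ground
        delayed = shift_register[0]                    # polarity `delay` clocks ago
        shift_register.append(sign)                    # oldest entry drops out
        if sign == delayed:                            # XNOR: one when signs agree
            count += 1
        if (k + 1) % samples_per_bit == 0:             # end of the bit interval
            bits.append(1 if count > samples_per_bit / 2 else 0)
            count = 0                                  # reset for the next bit
    return bits


def toy_signal(bits, delay, samples_per_bit):
    """Hypothetical stimulus: period `delay` for a one, 2*delay for a zero."""
    out = []
    for b in bits:
        period = delay if b else 2 * delay
        for k in range(samples_per_bit):
            # The half-sample offset avoids samples that are exactly zero.
            out.append(math.sin(2.0 * math.pi * (k + 0.5) / period))
    return out


message = [1, 0, 1, 1, 0]
decoded = sign_correlation_decode(toy_signal(message, 8, 400), 8, 400)
print(decoded == message)  # True
```

In this sketch the counter runs at 400 clocks per bit, far above the bit rate, so the few disagreements that occur at each bit boundary do not change the majority decision.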

[0091] FIG. 7 illustrates an example of a bit error count versus a time shift for detecting a watermark in accordance with the present invention.

[0092] In the time gate applications in particular, but in other triggering applications as well, it is important to achieve a good timing precision in detecting the end of a watermark. In the case of watermarks described in U.S. Pat. No. 5,940,135, this is equivalent to detecting the trailing edge of the last bit of the watermark bit stream. However, noise and channel distortions can corrupt the trailing edge of any bit, or the bit as a whole. Therefore, it is necessary to take the watermark in its totality to decide the most probable timing of the end of the watermark.

[0093] In typical digital watermarks described in U.S. Pat. No. 5,940,135, error correction codes are used to recover watermarks in the case when some bits are corrupted. In this case, the watermark can be detected at a time interval that is slightly earlier than its optimal position, but with a higher error count than in its optimal position. The present invention proposes to use this feature to further optimize the watermark timing detection.

[0094] In accordance with the present invention, the decoder will attempt to detect the watermark with a starting and ending position that are delayed slightly with respect to the timing position where initial detection of the watermark occurred. The time shifts are very small with respect to the watermark duration, typically of the order of 5% of the bit interval (T_(bit)) (each watermark contains tens or even hundreds of bits). In each of these shifts, the decoder will continue to detect the same watermark and monitor the bit error count over the duration of the watermark. The same watermark is detectable even with slight time shifts, which is the basis of the precise time resolution disclosed herein. The optimum timing is found where the error count is minimized. For multiple minimums, the optimum timing can be taken at the midpoint.

[0095] For example, in FIG. 7, the error count changes as the detecting time is shifted. Initially, with no time shift (i.e., at the originally-detected position), the watermark is detected with four bits in error (point 710). However, as the detecting time is shifted in steps of ts (e.g., ts=T_(bit)/20), the error count decreases (points 715 and 720), reaches a minimum for two shifts (points 725 and 730), and then increases (points 735, 740 and 745).

[0096] A minimum bit error (points 725 and 730) is reached for two consecutive steps, and we conclude that the best estimate for a reference position of the watermark (which can be, e.g., the ending time of the watermark) is at the mid-point of these two events, i.e., at (ts₃+ts₄)/2. Alternatively, either ts₃ or ts₄ could be selected. The resolution of the extractor's clock will govern the minimum possible time shift.

[0097] If only one minimum is found, the optimum position is taken at that time shift position.

[0098] To reduce computations, it is possible to terminate the bit error calculations once a minimum has been detected and the bit error count begins to rise (e.g., at point 735). Also, optionally, the bit error calculations can be terminated when a bit error count of zero is first reached.
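
By way of non-limiting illustration, the search for the optimum time shift, including the early termination once the error count begins to rise, may be sketched as follows. The function name and the callback `count_errors` are hypothetical. Of the error counts used in the usage lines, only the value of four at zero shift (point 710) is taken from FIG. 7 as described above; the remaining values are assumed solely to reproduce the described shape of the curve.

```python
def find_optimum_shift(count_errors, max_shifts):
    """Return (optimum shift in units of ts, minimum bit error count).

    count_errors(i) is assumed to return the bit error count measured when
    the detection window is delayed by i steps of ts.
    """
    best = None
    best_shifts = []
    for i in range(max_shifts):
        errors = count_errors(i)
        if best is None or errors < best:
            best, best_shifts = errors, [i]   # new minimum found
        elif errors == best:
            best_shifts.append(i)             # minimum persists over several shifts
        else:
            break                             # count begins to rise: stop early
    # A single minimum is returned as is; several are resolved to their mid-point.
    return (best_shifts[0] + best_shifts[-1]) / 2, best


# Assumed error counts for points 710 through 745 (illustrative values only).
assumed_counts = [4, 3, 2, 1, 1, 2, 3, 4]
evaluated = []


def count_errors(i):
    evaluated.append(i)
    return assumed_counts[i]


shift, errors = find_optimum_shift(count_errors, len(assumed_counts))
print(shift, errors, len(evaluated))  # 3.5 1 6
```

In the example, the minimum is shared by the third and fourth shifts, so the mid-point of 3.5 steps is returned, and the search stops after six evaluations rather than eight. A practical extractor would likely need additional safeguards against a noisy, non-monotonic error curve, which this sketch does not attempt.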

[0099] As a numeric example, assume the watermark message is two seconds long, and T_(bit)=0.08 sec. Let us say that the watermark is first detected in the audio signal starting at time 12.340 sec. and ending at time 14.340 sec. This corresponds to a zero time shift. Then, with ts₁=0.08/20=0.004 sec., the watermark is next detected starting at time 12.344 sec., and ending at time 14.344 sec., and so forth. Note that the detection interval is 2 sec., and the position shift, and the resolution, is 0.004 sec.
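
The detection windows of the numeric example above may be enumerated as in the following minimal sketch. The function name is hypothetical, and times are held as integer milliseconds, an implementation choice made here only to avoid floating-point rounding.

```python
def shifted_windows(start_ms, duration_ms, t_bit_ms, n_shifts, steps_per_bit=20):
    """List (start, end) detection windows, in ms, for successive time shifts."""
    step_ms = t_bit_ms // steps_per_bit        # ts = T_bit / 20 in the example
    return [(start_ms + i * step_ms, start_ms + i * step_ms + duration_ms)
            for i in range(n_shifts)]


# Two-second watermark, T_bit = 80 ms, first detected starting at 12.340 s.
windows = shifted_windows(12340, 2000, 80, 3)
print(windows)  # [(12340, 14340), (12344, 14344), (12348, 14348)]
```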

[0100] Moreover, for a strong watermark, the error count may reach zero or some other minimum value for several time shifts. In this case, the optimum timing is again set in the middle of the time intervals with the minimum errors.

[0101] Experiments show that this technique detects the watermark with a precision of +/−20% of the bit interval. For example, for watermarks running at a 50 bit/sec. rate, this corresponds to a time resolution of +/−4 ms. This resolution corresponds to +/−1.4 meters of acoustic propagation delay, so it is adequate for systems based on acoustic coupling.
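
The arithmetic behind these figures may be checked with the short sketch below. The function names are hypothetical, and the speed of sound of 343 m/s (dry air at about 20 °C) is an assumption introduced here; the text above does not state the value used.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def timing_resolution_s(bit_rate_bps, fraction_of_bit=0.20):
    """Timing precision as a fraction of the bit interval (1 / bit rate)."""
    return fraction_of_bit / bit_rate_bps


def acoustic_distance_m(delay_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Distance sound travels during the given delay."""
    return delay_s * speed_m_per_s


resolution = timing_resolution_s(50)         # 20% of a 20 ms bit interval
distance = acoustic_distance_m(resolution)
print(round(resolution * 1000, 3), "ms")     # 4.0 ms
print(round(distance, 1), "m")               # 1.4 m
```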

[0102] Accordingly, it can be seen that the present invention provides a method and apparatus with various advantages, including:

[0103] allows remote control of a device, such as a toy, lock, smart card, or home appliance, via a control message that is imperceptibly embedded in an audio signal as a watermark;

[0104] is compatible with, and builds upon, existing watermarking techniques;

[0105] allows the remote control signal to be synchronized with the audio content, such as to allow a toy to move in conjunction with the audio track of a children's television program;

[0106] uses a watermark to define a time gate (window) during which a device is enabled to receive a user input;

[0107] provides a security mechanism to ensure that the time gate is defined only from a real-time broadcast audio signal, and not from a replay of the audio signal;

[0108] provides a simplified watermark detector;

[0109] improves the robustness and temporal resolution of a watermark, e.g., in acoustic propagation channels, by shifting the detecting point based on an error count of the watermark; and

[0110] improves robustness by evaluating the masking ability of the host audio signal when embedding the watermark, and inserting the watermark in a suitable interval before the desired action, along with information indicating the time shift, if necessary.

[0111] Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous adaptations and modifications may be made thereto without departing from the spirit and scope of the invention as set forth in the claims.

What is claimed is:
 1. A method for robustly embedding data in an audio signal, comprising the step of: embedding at least a first message as a corresponding first watermark in an audio source signal to provide a composite audio signal by modifying the audio source signal so that data symbols of the first message are determined according to corresponding values of an autocorrelation function of the composite audio signal; wherein: the values of the autocorrelation function are determined using delays that differ for adjacent data symbols; and the first watermark is carried substantially inaudibly in the composite audio signal.
 2. The method of claim 1, wherein: each autocorrelation value is calculated over a time interval of a corresponding data symbol.
 3. A method for recovering data that is robustly embedded in an audio signal, comprising the steps of: (a) receiving a composite audio signal; wherein: at an encoder, at least a first message is embedded as a corresponding first watermark in an audio source signal to provide the composite audio signal by modifying the audio source signal so that data symbols of the first message are determined according to corresponding values of an autocorrelation function of the composite audio signal; the values of the autocorrelation function are determined using delays that differ for adjacent data symbols; and the first watermark is carried substantially inaudibly in the composite audio signal; and (b) recovering the first message from the composite audio signal.
 4. The method of claim 3, wherein: each autocorrelation value is calculated over a time interval of a corresponding data symbol.
 5. A method for robustly embedding data in an audio signal, comprising the steps of: determining a first, candidate time segment of an audio source signal for embedding at least a first message as a corresponding first watermark to provide a composite audio signal; evaluating the audio source signal to determine if the first time segment is suitable for masking the first message; and if the first time segment is found to be unsuitable: (a) selecting another, second time segment of the audio source signal that is suitable for masking the first message; and (b) embedding the first message in the second time segment along with information indicative of a time shift between the first and second time segments.
 6. The method of claim 5, wherein: the second time segment precedes the first time segment.
 7. A method for recovering embedded data from a composite audio signal, comprising the steps of: (a) receiving the composite audio signal; wherein: at least a first message is embedded as a corresponding first watermark in a first time segment of the composite audio signal, along with time shift information indicative of a time shift between the first time segment and another, second time segment; and an audio source signal of the composite audio signal is suitable for masking the first message during the first time segment, but is unsuitable for masking the first message during the second time segment; (b) recovering the first message and the time shift information from the first time segment of the composite audio signal; (c) determining when the time shift has elapsed in accordance with the time shift information; and (d) providing a control signal for controlling a device after the time shift has elapsed in accordance with said determining step.
 8. The method of claim 7, wherein: the second time segment precedes the first time segment.
 9. A method for decoding embedded data in an audio signal, comprising the step of: (a) receiving a composite audio signal; wherein data symbols of at least a first message are embedded as a corresponding first watermark in an audio source signal to provide the composite audio signal, and the first watermark is carried substantially inaudibly in the composite audio signal; the audio source signal is modified so that data symbols of the first message are determined according to corresponding values of an autocorrelation function of the composite audio signal; and each autocorrelation value is calculated over a time interval of a corresponding data symbol; and (b) recovering the embedded data symbols from the received composite audio signal according to polarities of the composite audio signal.
 10. A decoding method for improving the time resolution of the position of a message that is embedded in an audio signal, and encoded according to an error-correcting code, comprising the steps of: (a) receiving the audio signal; (b) determining an initial position in the audio signal at which the message is initially detected; (c) determining a bit error count for the message at each of a plurality of positions, including the initial position and at least one other position that is shifted relative to the initial position; (d) determining an optimum position at which it is most probable that the message is embedded in the audio signal in accordance with the bit error counts; and (e) recovering the message at the optimum position.
 11. The method of claim 10, wherein: the optimum position is determined according to the position that yields a bit error count that is a minimum among the plurality of positions.
 12. The method of claim 10, wherein: the optimum position is determined according to a mid-point of the positions that yield bit error counts that are minimums among the plurality of positions.